Abstract

We describe a scalable parallelization of Geant4 using commodity hardware in a collaborative
effort between the College of Computer Science and the Department of Physics at
Northeastern University. The system consists of a Beowulf cluster of 32 Pentium II processors
with 128 MBytes of memory each, connected via ATM and fast Ethernet. The bulk of the
parallelization is done using TOP-C (Task Oriented Parallel C), software widely used in the
computational algebra community. TOP-C provides a flexible and powerful framework for
parallel algorithm development, is easy to learn, and is available at no cost. Its task-oriented
nature allows one to parallelize legacy code while hiding the details of interprocess
communications. Applications include fast interactive simulation of computationally intensive
processes such as electromagnetic showers. General results motivate wider applications of
TOP-C to other simulation problems as well as to pattern recognition in high energy physics.

1 Introduction

Among the most CPU-consuming tasks in high energy physics experiments are the detailed
simulations of how detectors respond to high energy particles. Even today, many physics results are
given with contributions to the error due to the finite amount of Monte Carlo data available, and
in many cases this error is comparable to or even larger than other errors. Even in such large
and well-funded experiments as those at LEP, Monte Carlo statistics is a large component of the
error in precision electroweak measurements. In addition to its importance in the analysis of data,
Monte Carlo simulation is needed at all stages of the design of experiments in order to understand 
and optimize the detector design, as well as to develop a good grasp of the basic physics issues. In 
this paper we present the first results of an ongoing program to use commodity computing to pro- 
vide parallel computing for extremely fast Monte Carlo simulations. The aim is to go beyond the 
simple event-level parallelism which is commonly used today and actually run individual events 
through Geant4 faster than would be possible on any single workstation or PC. The work has 
important applications not only for large scale production, but for the rapid turnaround of ideas 
and designs for the working physicist - the difference between waiting a few minutes and a few 
seconds for an event to be simulated and viewed, for example, makes a world of difference for an 
interactive user. 
2 Geant4



Geant4 [1, 2] is the latest stage in the development of the GEANT software, superseding the earlier
FORTRAN versions with a new object-oriented approach in C++. For a variety of reasons, in no 
small part driven by the wish to work with software which is likely to see use in the near future, 
we decided to try to parallelize Geant4. The aim was to achieve a granularity finer than would be 
achieved by simply farming out separate events to separate CPU's and collecting the results. The 
approach taken was to perturb the existing software as little as possible and to modify a section of 
the code which handles particle tracking and interaction (a frequent operation) to allow it to run 
on multiple CPU's using TOP-C. 

3 Parallelization of Geant4 Using Task-Oriented Parallel C (TOP-C)

TOP-C (Task Oriented Parallel C) was initially designed with the twin goals of making parallel
applications easy to write and tolerating the high latency typically found on Beowulf
clusters. It is freely available at ftp://ftp.ccs.neu.edu/pub/people/gene/topc/. The
same application source code has been run under shared and distributed memory (SMP, IBM SP-2,
NoW, Beowulf cluster). A sequential TOP-C library is also provided to ease debugging. The
largest example to date was a computer construction of Janko's group over three months using 
approximately 100 nodes of an IBM SP-2 parallel computer at Cornell University [3].

The TOP-C programmer's model is a master-slave architecture based on three key con- 
cepts: 

1. tasks in the context of a master/slave architecture; 

2. global shared data with lazy updates; and 

3. actions to be taken after each task. 

Task descriptions (task inputs) are generated on the master, and assigned to a slave. The slave 
executes the task and returns the result to the master. The master may update shared data on all 
processes. Such global updates take place on each slave after the slave completes its current task. 
The programmer's model for TOP-C is graphically described below. 
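
The life cycle described above can be imitated in a single process. The following C++ sketch is not the TOP-C interface: `Action`, `TaskModel`, `run_sequential`, and `largest_square` are hypothetical names chosen for illustration, and the four callbacks only mirror the roles named in the text (generate a task input on the master, execute it on a slave, inspect the result on the master, update the shared data).

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical names throughout: a one-process imitation of the task life
// cycle, NOT the TOP-C library interface.
enum class Action { NoAction, Update };

template <typename In, typename Out, typename Shared>
struct TaskModel {
    std::function<bool(In&)> generate_input;                        // master: false when no tasks remain
    std::function<Out(const In&, const Shared&)> do_task;           // slave: reads shared data only
    std::function<Action(const In&, const Out&, const Shared&)> check_result;  // master
    std::function<void(const In&, const Out&, Shared&)> update_shared;         // applied to shared data
};

// Runs tasks one at a time and returns how many were executed.
template <typename In, typename Out, typename Shared>
int run_sequential(const TaskModel<In, Out, Shared>& model, Shared& shared) {
    int executed = 0;
    In input{};
    while (model.generate_input(input)) {
        const Out result = model.do_task(input, shared);
        ++executed;
        if (model.check_result(input, result, shared) == Action::Update) {
            model.update_shared(input, result, shared);
        }
    }
    return executed;
}

// Example: the shared datum is the largest square seen so far.
inline int largest_square(const std::vector<int>& values) {
    std::size_t next = 0;
    TaskModel<int, int, int> model;
    model.generate_input = [&values, &next](int& out) -> bool {
        if (next == values.size()) return false;
        out = values[next++];
        return true;
    };
    model.do_task = [](const int& x, const int&) -> int { return x * x; };
    model.check_result = [](const int&, const int& sq, const int& best) -> Action {
        return sq > best ? Action::Update : Action::NoAction;
    };
    model.update_shared = [](const int&, const int& sq, int& best) { best = sq; };
    int best = 0;
    run_sequential(model, best);
    return best;
}
```

In a real parallel run the slave only ever sees a possibly stale copy of the shared data, which is why the task routine receives it as read-only here and all changes go through the update callback.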

The task-oriented approach of TOP-C is ideally suited to parallelizing legacy applications. 
We chose the TOP-C task to be computation of a particle track in Geant4. The largest difficulty 
was in marshalling and unmarshalling the C++ G4Track objects that had to be passed to the slave
processes. Marshalling is the process by which one produces a representation of an object in a 
contiguous buffer suitable for transfer over a network, and unmarshalling is the inverse process. 
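
As an illustration of the idea, the sketch below marshals a deliberately simplified track into a byte buffer and reads it back. `SimpleTrack` and its fields are hypothetical; the real G4Track refers to other objects through pointers and needs considerably more care. The sketch also assumes that sender and receiver share byte order and floating-point format, as on a homogeneous cluster.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for a particle track.
struct SimpleTrack {
    std::int32_t particle_code;  // particle type identifier
    double position[3];          // x, y, z
    double momentum[3];          // px, py, pz
    double kinetic_energy;
};

// Bytes used by one marshalled track: fields are packed with no padding.
constexpr std::size_t kTrackBytes = sizeof(std::int32_t) + 7 * sizeof(double);

// Appends the marshalled form of `t` to the end of `buf`.
inline void marshal(const SimpleTrack& t, std::vector<unsigned char>& buf) {
    const std::size_t start = buf.size();
    buf.resize(start + kTrackBytes);
    unsigned char* p = buf.data() + start;
    std::memcpy(p, &t.particle_code, sizeof t.particle_code); p += sizeof t.particle_code;
    std::memcpy(p, t.position, sizeof t.position);            p += sizeof t.position;
    std::memcpy(p, t.momentum, sizeof t.momentum);            p += sizeof t.momentum;
    std::memcpy(p, &t.kinetic_energy, sizeof t.kinetic_energy);
}

// Rebuilds the track stored at byte `offset` of `buf`.
inline SimpleTrack unmarshal(const std::vector<unsigned char>& buf, std::size_t offset) {
    assert(offset + kTrackBytes <= buf.size());
    SimpleTrack t;
    const unsigned char* p = buf.data() + offset;
    std::memcpy(&t.particle_code, p, sizeof t.particle_code); p += sizeof t.particle_code;
    std::memcpy(t.position, p, sizeof t.position);            p += sizeof t.position;
    std::memcpy(t.momentum, p, sizeof t.momentum);            p += sizeof t.momentum;
    std::memcpy(&t.kinetic_energy, p, sizeof t.kinetic_energy);
    return t;
}
```

Copying field by field, rather than copying the whole struct at once, keeps compiler-inserted padding out of the buffer and makes the wire layout explicit.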

We developed a 6-step software methodology to allow ourselves to incrementally parallelize 
Geant4, allowing us to isolate individual issues. The six steps were: 

1. the use of .icc (include) files to isolate our code from the original Geant4 code;

2. collecting the code of the inner loop in a separate routine, DoTask(), whose input was a
primary particle track, and whose output was the primary and its secondary particles; 

3. marshalling and unmarshalling the C++ objects for particle tracks (gdb, a symbolic
debugger, and etags, an Emacs facility for source code browsing, were invaluable here for
inspecting the internals of the objects);

4. integrating the marshalled versions of the particle tracks with the calls to DoTask();

5. adding TOPC_init(), TOPC_submit_task_input, and other routines and then testing as 
the marshalled particle tracks were sent across the network; 

6. and finally adding CheckTaskResult(), which inspected the task output and added the
secondary tracks to the stack, for later processing by other slave processes. 
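
Steps 2 and 6 above can be pictured with a toy shower in sequential C++. The routine names DoTask() and CheckTaskResult() come from the text, but their signatures, the `ToyTrack` type, and the splitting rule (a track above threshold yields two secondaries of half its energy) are assumptions made for illustration and do not represent Geant4 physics or the code actually written.

```cpp
#include <cassert>
#include <vector>

struct ToyTrack { double energy; };  // hypothetical stand-in for a track

struct TaskOutput {
    double deposited;                   // energy absorbed by this track
    std::vector<ToyTrack> secondaries;  // new tracks still to be processed
};

// Toy version of tracking one particle: above the threshold the track
// splits into two secondaries of half its energy; otherwise it is absorbed.
inline TaskOutput DoTask(const ToyTrack& t, double threshold) {
    TaskOutput out{0.0, {}};
    if (t.energy > threshold) {
        out.secondaries.push_back(ToyTrack{t.energy / 2});
        out.secondaries.push_back(ToyTrack{t.energy / 2});
    } else {
        out.deposited = t.energy;
    }
    return out;
}

// Master-side step: record the deposit and push secondaries onto the stack.
inline void CheckTaskResult(const TaskOutput& out, std::vector<ToyTrack>& stack,
                            double& total_deposited) {
    total_deposited += out.deposited;
    stack.insert(stack.end(), out.secondaries.begin(), out.secondaries.end());
}

struct ShowerSummary { int tasks; double deposited; };

// Processes the stack until it is empty, one task per track.
inline ShowerSummary simulate_shower(double primary_energy, double threshold) {
    assert(threshold > 0.0);
    std::vector<ToyTrack> stack{ToyTrack{primary_energy}};
    ShowerSummary summary{0, 0.0};
    while (!stack.empty()) {
        const ToyTrack t = stack.back();
        stack.pop_back();
        const TaskOutput out = DoTask(t, threshold);
        ++summary.tasks;
        CheckTaskResult(out, stack, summary.deposited);
    }
    return summary;
}
```

Because each call to DoTask() depends only on its input track, the tracks waiting on the stack are independent of one another, which is what allows them to be handed to different slave processes.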
[Figure: TOP-C Programmer's Model (Life Cycle of a Task)]



Prior to the fifth step, all debugging was in a sequential setting. The maturity of the TOP-C 
library then allowed us to create fully functioning parallel code in less than a day.



4 A Test Simulation of Extensive Air Showers 



We describe here a simple test of the parallelized Geant4 code described above. So far we have 
confined work to electromagnetic calorimetry, one of the most time-consuming, yet physically 
understandable tasks to simulate. 

With the observation of ultrahigh energy cosmic-ray-induced air showers initiated by primaries
carrying over 10^20 eV [4], there is a growing interest in better modelling of particle
interactions [5]. The currently most popular programs [6, 7] use dedicated particle transport and
interaction codes to perform the simulation. Invariably they contain approximations in order to make the
code run in a reasonable time, but these approximations must at some point be tested against our 
best models of physics. In addition, these programs can lack the flexibility of a general-purpose 
program like Geant4. To this end, we consider the modelling of ultrahigh energy air showers in- 
duced by gamma rays, for now taking into account only electromagnetic interactions. Inclusion 
of hadronic interactions is underway, pending a better understanding of how to handle them at 
ultrahigh energies using Geant4. 

In the case we consider here, the description of the calorimeter is moderately complicated. 
The atmosphere is defined by a stack of 230 layers of increasing thickness and decreasing density 
with the height above sea level. The layer thicknesses start at 50 m (sea level) and at higher 
altitudes are as thick as 1 km. The variable density was modeled using Linsley's parametrization 
of the U.S. Standard Atmosphere.
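
A layered atmosphere of this kind could be tabulated as in the C++ sketch below. Two points are assumptions not taken from the text: the layer thickness is taken to grow geometrically from the first to the last value, and the density comes from a simple isothermal exponential profile used only as a stand-in for Linsley's parametrization, which is not reproduced here. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Layer {
    double bottom_m;       // height of the lower face above sea level
    double thickness_m;
    double density_kg_m3;  // evaluated at the layer midpoint
};

// ASSUMPTION: isothermal exponential atmosphere, a stand-in for the
// parametrization used in the paper.
inline double toy_density(double height_m) {
    const double rho0 = 1.225;           // sea-level air density, kg/m^3
    const double scale_height = 8400.0;  // approximate scale height, m
    return rho0 * std::exp(-height_m / scale_height);
}

// ASSUMPTION: thickness grows geometrically from first_m to last_m.
inline std::vector<Layer> build_atmosphere(int n, double first_m, double last_m) {
    assert(n >= 2 && first_m > 0.0 && last_m >= first_m);
    const double ratio = std::pow(last_m / first_m, 1.0 / (n - 1));
    std::vector<Layer> layers;
    layers.reserve(n);
    double bottom = 0.0;
    double thickness = first_m;
    for (int i = 0; i < n; ++i) {
        layers.push_back(Layer{bottom, thickness, toy_density(bottom + 0.5 * thickness)});
        bottom += thickness;
        thickness *= ratio;
    }
    return layers;
}
```

Calling `build_atmosphere(230, 50.0, 1000.0)` under these assumptions gives a stack roughly 73 km high; a different growth rule for the thickness would change that figure.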

Preliminary comparisons with the serial version of the code show excellent agreement, and
comparisons with other shower simulation codes are underway.



5 Conclusions 



Geant4 (approximately 100,000 lines of C++ code) was successfully parallelized using TOP-C. 
This was done despite the fact that none of our group had prior experience with Geant4. It remains 
to obtain timing tests on a long run with many processors. Initial results for the example described 
indicate that a single task in our example requires approximately 1 ms of CPU time. Hence, it will 
be essential to submit approximately 100 particles for a single slave process to compute, in order 
to overcome network overhead. Optimization of the parallel implementation is underway and we
are also interested in collaboration with other groups who may have needs for the speedups that 
our methodology offers. 
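
The batching argument can be made concrete with a small calculation in C++. The per-message overhead is not given in the text, so the sketch takes it as a parameter and any value passed in is an assumption; the function names are hypothetical. If a message carrying `batch` tasks costs a fixed overhead, the fraction of time spent on useful computation is batch * task_ms / (batch * task_ms + overhead_ms).

```cpp
#include <cassert>

// Fraction of time a slave spends computing when `batch` tasks travel in one
// message. `overhead_ms` is the assumed fixed cost per message.
inline double batch_efficiency(int batch, double task_ms, double overhead_ms) {
    assert(batch > 0 && task_ms > 0.0 && overhead_ms >= 0.0);
    const double compute_ms = batch * task_ms;
    return compute_ms / (compute_ms + overhead_ms);
}

// Smallest batch size whose efficiency reaches `target` (0 < target < 1).
inline int min_batch_for(double target, double task_ms, double overhead_ms) {
    assert(target > 0.0 && target < 1.0 && task_ms > 0.0 && overhead_ms >= 0.0);
    int batch = 1;
    while (batch_efficiency(batch, task_ms, overhead_ms) < target) {
        ++batch;
    }
    return batch;
}
```

With 1 ms per task and a purely illustrative overhead of 1 ms per message, sending one particle at a time leaves the slave computing only half the time, while a batch of 100 particles raises that to about 99%.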

Task Oriented Parallel C seems to be well-suited to the problem of parallelizing Geant4, 
and would likely be well-suited to other high energy physics applications as well. Its flexibility 
and simplicity make it possible to envision enormous speedups for Geant4 within a single event,
something not often considered in high energy experiments, but offering many advantages over the 
usual, trivial parallelism, especially during interactive data analysis and code or hardware design. 